Cell Systems
Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Cell Systems's content profile, based on 167 papers previously published here. The average preprint has a 0.56% match score for this journal, so anything above that is already an above-average fit.
Shoyer, T. C.; Di Ventura, B.
Transcription factors (TFs) respond to external stimuli with time-varying changes in activity or localization (TF dynamics), driving differential transcriptional programs. Previous studies indicated that TF dynamics can be decoded at the promoter level in eukaryotes, yet a systematic understanding of robust solutions is lacking. By computationally screening over 10,000 mathematical models of multi-state promoters with various forms of TF-mediated regulation, we identify robust configurations that selectively respond to sustained ("pulse filtering") or pulsatile ("pulse boosting") TF dynamics. Promoters that activate via intermediate states and have negatively regulated deactivation robustly perform pulse filtering. In contrast, robust pulse boosting is achieved by promoters with a TF-mediated refractory state that permits short activation and recovers between pulses. Bifunctional TFs that exert activator- and repressor-like regulation extend the design space for pulse boosting. These results reveal general principles by which promoters interpret TF dynamics and suggest strategies to engineer synthetic systems to exploit them.
Highlights:
- Computational screen of over 10,000 promoter models identifies features that enable promoters to selectively respond to sustained ("pulse filtering") or pulsatile ("pulse boosting") transcription factor (TF) dynamics.
- Promoters that activate via intermediate states and have negatively regulated deactivation robustly perform pulse filtering.
- Promoters with TF-regulated refractoriness robustly perform pulse boosting.
- Promoters regulated by bifunctional TFs extend the design space for pulse boosting.
Barreto, Y. B.; Jongman, E. P. H.; Patino-Ruiz, M. F.; Grundel, D. A. J.; Uysal, M.; Coenradij, J.; Poolman, B.; Heinemann, M.
When exposed to a nutrient, cells activate metabolism by reorganizing metabolite pools and enzyme expression to approach the maximal growth rate permitted by physicochemical constraints. While these constraints define reachable steady states, here we propose that the Gibbs energy accessible at activation further limits which states are reached. Using minimal metabolic models, we find that limited accessible Gibbs energy can trap cells in low-growth states by constraining metabolic reorganization and imposing a proteomic burden on transport and phosphorylation reactions. To investigate this experimentally, we reconstituted the arginine deiminase pathway in vesicles, revealing that the size of a conserved pool of interconverting metabolites (arginine, citrulline, and ornithine) determines accessible Gibbs energy and constrains steady-state ATP production rate, a proxy for growth. Together, these results indicate that cellular metabolism retains memory of its initial energetic state, with accessible Gibbs energy at activation acting as a thermodynamic constraint on long-term growth.
Farinas, M.; Bermudez, V.; Tsirvouli, E.; Zobolas, J.; Aittokallio, T.; Lehti, K.; Flobak, A.; Lippestad, K.
Effective drug combination therapies can improve cancer treatment, yet the mechanistic basis of drug synergy remains poorly understood. Most computational approaches prioritize predictive accuracy over molecular mechanistic interpretability and hence provide limited insight into how synergistic effects emerge across signalling contexts. We developed Trafikk, a molecular-signalling network-based framework that simulates drug perturbations in cell line-specific computational models to mirror functional outcomes in experimental combination screens. Across two independent large-scale datasets, Trafikk identified synergistic combinations with >77% recall. Functional response predictions revealed both conserved and context-dependent mechanisms. While AKT-MEK co-inhibition consistently disrupted coordinated survival and apoptotic signalling in 742 cell lines, PI3K-BCL2 synergy arose through distinct death programs shaped by cell-context-specific network constraints. Trafikk combines predictive performance with mechanistic interpretability, capturing how and why drug synergy emerges across cellular contexts. Source code, installation instructions and usage tutorial are freely available at https://github.com/druglogics/trafikk.
Maxian, O.; Munro, E.; Dinner, A.
A key question in cell biology is how cell-scale organization emerges from a given set of molecular players and rules of interaction. Given its multiscale nature, addressing this question requires a combination of experimental perturbation, mathematical modeling, and parameter inference. We leverage recent advances in each of these fields, focusing in particular on neural-network methods for simulation-based inference, to study how cell-scale patterns of Rho GTPase activity are defined by molecular-scale activator-inhibitor interactions with filamentous actin. We show that variations in F-actin assembly dynamics can be inferred directly from experimental data by combining a mathematical model with a neural network trained to associate parameter sets with data. Our neural approach differentiates data sets more precisely than traditional summary statistics, and yields a complete and robust likelihood function for each data set. Utilizing the trained network, we demonstrate how RhoGAP tunes RhoA waves via interaction with F-actin. After showing that the known functions of RhoGAP are insufficient to explain experimentally observed dynamics, we use neural methods to infer that RhoGAP must, at a minimum, also decrease filament nucleation rates to sustain waves. Our work yields specific, experimentally testable predictions and illustrates how a combination of traditional forward models and modern inference tools can aid in unraveling mechanisms of self-organization.
Zhu, A.; Ho, P.-Y.
Bacterial growth and the underlying metabolic networks are highly dissimilar across species, posing a fundamental challenge for bioengineering tasks involving diverse species. For a given species across nutrient environments, growth is regulated via proteome allocation, which gives rise to linear relationships between growth and the sizes of coarse-grained proteome sectors. However, whether and how coarse-grained growth predictors generalize across species remains unclear. Here, using genome-scale metabolic models, we discover a simple cross-species trend in which the monoculture growth of a species is proportional to the number of nutrients it utilizes, indicating that the latter is a regulatory feature that is conserved across species. By coarse-graining metabolic networks using feature learning, we identify novel proteome sectors whose sizes exhibit cross-species correlations with growth in wide-ranging experiments, suggesting that these sectors are also conserved regulatory features. We further show that the sectors enable a predictive encoding of proteome costs and growth benefits, thereby providing a potential explanation for how coarse-grained network features emerge as simple determinants of growth across diverse metabolic networks.
Qi, S.-a.; Chapfuwa, P.
Predicting cellular responses to genetic or chemical perturbations across biological contexts is central to drug development and disease understanding. Despite increases in data and model scale, deep learning models have not consistently outperformed simple baselines. Leveraging causal transportability theory, we show that cross-context generalization is governed by shared causal mechanisms, not merely distributional similarity. To enable controlled evaluation, we develop a causal simulator that generates realistic semi-synthetic Perturb-seq datasets with tunable mechanistic divergence, providing benchmarks with known ground-truth causal structure. Further, we adapt the Vendi diversity score to the perturbation setting as a diagnostic for mode collapse, a failure mode invisible to standard per-perturbation metrics. Extensive experiments across four deep learning models and six simple baselines on semi-synthetic and real Perturb-seq datasets reveal a cross-context generalization gap: performance under cross-context splits drops substantially, often to simple baseline levels. Notably, even on synthetic data with fully specified causal structure, no model generalized across contexts with different causal mechanisms. These results underscore the need for cross-context evaluation, diversity-aware metrics, and mechanistically grounded inductive biases.
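In its standard form (Friedman and Dieng), the Vendi score is the exponential of the Shannon entropy of the eigenvalues of a similarity matrix divided by the number of items. The minimal sketch below shows why it exposes mode collapse: identical predictions score 1, while n mutually orthogonal predictions score n. It assumes cosine similarity between predicted expression profiles as the kernel. The abstract does not say which kernel the authors used or how they adapted the score to perturbations, so treat `vendi_score` as a hypothetical helper.

```python
import numpy as np


def vendi_score(profiles):
    """Vendi score of a set of row vectors, using cosine similarity as the kernel.

    Assumption: cosine similarity is an illustrative kernel choice, not the
    preprint's perturbation-specific adaptation.
    """
    X = np.asarray(profiles, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit rows, so K_ii = 1
    n = X.shape[0]
    K = X @ X.T                                        # positive semidefinite similarity matrix
    eigvals = np.linalg.eigvalsh(K / n)                # eigenvalues sum to trace(K)/n = 1
    eigvals = eigvals[eigvals > 1e-12]                 # drop numerical zeros before taking logs
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


# Mode collapse: every perturbation receives the same predicted profile.
collapsed = [[1.0, 2.0, 0.5]] * 4
# Fully distinct predictions: mutually orthogonal profiles.
distinct = np.eye(4)
print(vendi_score(collapsed), vendi_score(distinct))
```

Per-perturbation error metrics can look acceptable even for the collapsed set if the shared profile is close to the average response. The Vendi score (about 1 versus about 4 here) makes the lack of diversity explicit.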
Kazemeini, A.; Prieto, J.; Balaji Kuttae, S.; Siokis, A.; Singh, G.; Passban, P.; Andreani, T.
Quantitative Systems Pharmacology (QSP) models play an inherently interventional role in pharmaceutical research and development, functioning as executable causal systems for designing, evaluating, and replacing clinical trials. However, deploying QSP as an experimental planning engine remains constrained by the difficulty of translating unstructured literature descriptions of clinical or preclinical scenarios into reproducible, simulation-ready model interventions. Motivated by this issue, we propose an agent-based framework that operationalizes QSP models as intervention-ready experimental systems by automatically extracting and executing literature-derived scenarios. The framework combines semantic grounding of model entities with a large language model (LLM)-driven Scenario Extractor and a dual-agent Scenario Mapper. Rather than relying on opaque, single-shot reasoning, our pipeline converts free-text interventions into precise parameter configurations through discrete, verifiable work orders. Moreover, our dynamic Human-in-the-Loop (HITL) strategy empowers modelers to resolve biological ambiguities interactively. Across four diverse kinetic ordinary differential equation (ODE)/QSP models and seven Subject Matter Expert (SME)-curated literature scenarios, our framework resolved all selected scenarios into correct executable parameter changes, including multi-dose interventions, unit conversions, no-op scenarios, and ambiguity-triggered HITL cases. This demonstrates that structured collaboration between experts and agentic systems can resolve scenarios that standalone LLM calls reasoning over raw Systems Biology Markup Language (SBML) handle unreliably.
Zhang, J.; Schwartz, M. A.; Mutaher, M.; Olajide, O.; Pritykin, Y.; Ashenberg, O.; Hacohen, N.; Uhler, C.
Perturbations of genes with functional importance in T cells could be used to change the distribution of CD8 T cell states to enhance anti-tumor functions for cancer immunotherapies. We launched a worldwide computational challenge to predict the effects of gene perturbations and to devise objective functions for prioritizing gene perturbations that lead to desired T-cell state distributions. We supported the challenge by generating a single-cell Perturb-seq dataset profiling the effect of knocking out 73 individual expert-defined genes in T cells transferred into a mouse melanoma model. We compared the top algorithms developed by participants and found that performance was primarily determined by the prior data used for gene feature representation, with features derived from perturbational data proving most effective. Experimental validation of the top 61 genes nominated by the algorithms revealed that perturbation of Ndufv2 and Dimt1 reached the defined objective and biased T cell differentiation toward desired states.
Landajuela, M.
Antibody design campaigns increasingly generate many candidates before only a small subset can be tested experimentally, making candidate filtering a central bottleneck. We study whether an autoresearch loop can discover better training-free filters for antibody binder classification by iteratively proposing rule variants, evaluating them under a fixed Leave-One-System-Out protocol, recording each experiment in version control, and using the results to guide the next iteration. Across 75 unique logged filter variants on seven antibody-antigen systems, the loop improves average ROC-AUC from 0.6371 for the initial baseline to 0.8060 for a compact final rule that we call the RMSD-Tuned Triad rule, an absolute gain of 0.1689 and a relative improvement of 26.5%. The discovered filter is competitive with supervised machine learning baselines and prompted LLM baselines evaluated on the same systems: it exceeds logistic regression (0.7144), feature-selected balanced logistic regression (0.7536), and GPT-4o tabular few-shot prompting (0.7640), and it comes within 0.0044 ROC-AUC of the strongest GPT-5 tabular few-shot result (0.8104). Unlike the LLM baseline, the final rule requires no prompted examples and no LLM inference once the numeric structure-derived features are available. These results show that systematic autoresearch can turn simple structural-confidence signals into compact, interpretable filters that are useful when target-specific training data are scarce.
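A training-free filter is judged only by how well its scores rank binders above non-binders, which is what ROC-AUC measures. The sketch below computes ROC-AUC as the normalized Mann-Whitney statistic, with ties counted as one half. It then rechecks the gain arithmetic reported above. The helper names `roc_auc` and `mean_auc_over_systems` are hypothetical. The unweighted per-system mean is an assumption about how the seven systems are pooled into an average.

```python
from itertools import product


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random binder outscores a random non-binder.

    Equals the Mann-Whitney U statistic divided by (#binders * #non-binders); ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


def mean_auc_over_systems(per_system):
    """Unweighted mean ROC-AUC; `per_system` maps a system name to (scores, labels)."""
    aucs = [roc_auc(scores, labels) for scores, labels in per_system.values()]
    return sum(aucs) / len(aucs)


# Averages reported in the abstract: initial baseline vs. final rule.
baseline, final = 0.6371, 0.8060
absolute_gain = final - baseline
relative_gain = absolute_gain / baseline
print(f"absolute gain {absolute_gain:.4f}, relative gain {relative_gain:.1%}")
```

The printed values (0.1689 and 26.5%) match the figures in the abstract. Note that ROC-AUC depends only on the ranking of scores, so any monotone rescaling of a rule's output leaves it unchanged.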
Teng, D.; Qiu, Y.; Sakthivel, G.; Aranganathan, A.; Herron, L.; Tiwary, P.
While RNA language models (LMs) have served as foundation models (FMs) to advance structural prediction, their evaluation relies heavily on supervised downstream tasks. Such tasks can mask FM inefficiencies and reflect memorization of downstream training sets. To address this, we introduce REDIAL (RNA Embedding perturbation Diagnostics for Language models), a zero-shot, unsupervised framework designed to extract coevolutionary signals directly from the high-dimensional latent spaces of RNA LMs. Applying REDIAL in a layer-wise dissection and ablation study, we uncover stark layer-wise disparities in how popular RNA LMs internalize structural constraints. Our results show how this layer-wise behavior deviates from that of protein LMs and relates to design flaws in the architectures. Specifically, we show that current RNA LMs are severely overparameterized relative to the limited sequence diversity of available RNA databases, leading to profound parameter inefficiency and overfitting. Furthermore, we establish that structure-guided pretraining fundamentally improves the signal-to-noise ratio of learned coevolutionary couplings compared to sequence-only baselines. Ultimately, this unsupervised evaluation paradigm exposes critical flaws in current parameter scaling strategies and provides a rigorous diagnostic benchmark to guide the development of more efficient, generalizable foundation models for RNA therapeutics and de novo design.
Kim, J.; Romero, P. A.
Large language models (LLMs) are increasingly deployed as agents for scientific discovery, but standardized frameworks for evaluating their performance and behaviour in scientific workflows are lacking. Protein design provides a demanding test case because modern workflows combine stochastic generative models, structure prediction systems, and physics-based evaluation tools that require extensive candidate exploration and filtering. Here we introduce BioDesignBench, a benchmark of 76 expert-curated protein design tasks spanning antibodies, enzymes, fluorescent proteins, binders, and scaffolds, together with human and non-LLM baselines and behavioural metrics derived from tool-use traces. We evaluate four frontier LLM agents across diverse protein design workflows and find that the strongest agents surpass deterministic hardcoded pipelines but consistently underperform expert practice. Although agents generally select appropriate tools, they evaluate candidate designs too shallowly, rarely compare alternatives, and terminate exploration prematurely. Guided workflows improve tool coverage but not evaluation depth. Enforcing deeper multi-metric evaluation substantially improves agent performance, demonstrating that these limitations are behavioural rather than fundamental capability constraints. We release BioDesignBench, open-source reference agents, and a public leaderboard as a community resource for evaluating and improving AI agents for protein engineering.
Calvanese, F.; Lombardi, G.; Weigt, M.; FERNANDEZ-DE-COSSIO-DIAZ, J.
Protein language models (pLMs) leverage large-scale evolutionary data to generate novel sequences, but steering generation toward desired physicochemical properties without sacrificing diversity remains a major challenge. Existing approaches often induce severe diversity loss or require computationally expensive retraining. We introduce Iterative Lookback Monte Carlo (ILMC), a training-free inference-time sampling strategy that interleaves autoregressive elongation with Metropolis-Hastings refinement to approximate sampling from a maximum-entropy target distribution balancing generative quality and steering objectives. We show theoretically that this target distribution is entropy-maximizing under fixed generative quality and steering constraints, and empirically that ILMC produces more diverse samples than standard autoregressive baselines at matched generative quality. Using simple steering potentials, ILMC improves desired molecular properties, including generating proteins with up to 12 °C higher predicted melting temperature than compute-matched alternative strategies. ILMC naturally applies to classifier-guided steering, where it outperforms purely autoregressive guidance in diversity while maintaining comparable enrichment of target properties. We validate ILMC on family-specific pLMs and on the multi-family model ProGen3.
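The Metropolis-Hastings refinement step named above can be illustrated with a generic sampler. This is not ILMC: it has no autoregressive elongation and no lookback. The target form pi(x) proportional to q(x) * exp(beta * V(x)) is an assumption, because the abstract does not give ILMC's exact target. The base model q is a toy position-independent distribution standing in for a pLM, and `mh_steer` and `count_a` are hypothetical names. The proposal redraws one position from q, so the q terms cancel and the acceptance probability reduces to min(1, exp(beta * (V(x') - V(x)))).

```python
import math
import random


def mh_steer(site_probs, potential, beta, length, n_steps, seed=0):
    """Metropolis-Hastings sampling from pi(x) proportional to q(x) * exp(beta * V(x)).

    Assumptions: q is a toy position-independent model given by `site_probs`
    (token -> probability); `potential` computes the steering potential V.
    """
    rng = random.Random(seed)
    tokens = list(site_probs)
    weights = [site_probs[t] for t in tokens]
    x = rng.choices(tokens, weights=weights, k=length)  # initial draw from the base model
    v = potential(x)
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(length)                        # pick one position uniformly
        proposal = x.copy()
        proposal[i] = rng.choices(tokens, weights=weights, k=1)[0]  # redraw it from q
        v_new = potential(proposal)
        # q(x')/q(x) cancels against the proposal ratio, leaving only the steering term.
        if rng.random() < math.exp(min(0.0, beta * (v_new - v))):
            x, v = proposal, v_new
        samples.append(x.copy())
    return samples


def count_a(seq):
    """Toy steering potential: number of "A" tokens in the sequence."""
    return seq.count("A")


base = {"A": 0.5, "B": 0.5}
chain = mh_steer(base, count_a, beta=math.log(3), length=1, n_steps=20000)
print(sum(s[0] == "A" for s in chain) / len(chain))  # should be close to 3/4 for this toy
```

With a uniform base and beta = ln 3, the target gives "A" three times the weight of "B", so the chain's frequency of "A" should approach 0.75. Setting beta = 0 recovers the unsteered base distribution.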
Wang, M.; Yuan, M.; Vasilakos, A. V.; He, Y.; Ren, Z.
Protein language models (PLMs) like the ESM series encapsulate immense evolutionary knowledge within their high-dimensional continuous embeddings. However, these latent representations are densely entangled, obscuring the fine-grained biophysical constraints necessary for precise functional resolution. To unlock the full expressive power of these embeddings, we propose PLM-SAE, a mechanistic framework that employs Sparse Autoencoders (SAEs) to disentangle PLM representations into discrete, biologically interpretable activations. By isolating and directly intervening on critical functional features, we fundamentally enhance the structural and mutational awareness of the underlying embeddings. We rigorously validate this embedding enhancement on variant effect prediction (VEP). In the unsupervised zero-shot setting, our sparse modulation elevates the state-of-the-art ESM-3 model, yielding performance improvements across 114 deep mutational scanning datasets and delivering an 80.8% relative improvement on challenging targets like the human E3 ubiquitin ligase HECD1. Furthermore, our target-specific differentiable gating mechanism achieves consistent performance gains in over 80% of evaluated datasets with an average Spearman ρ increase of +0.138. Finally, extending this approach to a cross-fitness multitask architecture establishes new state-of-the-art results on 17 VenusMutHub datasets, highlighted by a 169.0% performance surge in small-molecule binding predictions. Our work demonstrates that refining the highly entangled latent manifold via sparse modulation provides a robust and generalizable foundation for enhancing downstream PLM capabilities.
Ellington, C. N.; Addagudi, S.; Wang, J.; Lengerich, B. J.; Xing, E. P.
Virtual screening methods prioritize therapeutic candidates by predicting molecular properties and interactions. However, molecular models are insufficient to predict higher-order effects that arise in real biological systems, leading to late-stage failures in drug discovery. Virtual cells have been posed as a solution to this problem by predicting gene expression responses to drugs, but they remain weakly validated as screening tools; gene expression is only an intermediate in understanding drug success or failure. Despite burgeoning progress in virtual cells, some basic questions remain. Is expression even a good representation of higher-order drug effects? How can expression and other cell-level representations be applied to prioritize therapeutic candidates? Can cell-level methods be fairly compared against traditional molecular-level screens? We address these questions in a two-pronged approach. First, we curate two benchmarks, Drug-Disease Retrieval Bench (DDR-Bench) and Drug-Target Retrieval Bench (DTR-Bench), which directly compare cell-level methods against traditional molecular methods on canonical drug discovery tasks. DDR-Bench evaluates a method's ability to prioritize disease indications for drugs with novel target profiles. DTR-Bench evaluates a method's ability to reconstruct drug-target interactions from separate perturbation modalities that act on shared mechanisms, bridging the gap between cell-level methods and classic molecular screens. We identify shortcomings of existing screening methods on these benchmarks, and propose an alternative representation of drug effects: perturbed gene networks. Inferring post-perturbation gene networks on-demand for unseen drugs requires methods that generalize beyond traditional plug-in network estimators. 
We develop a scalable differentiable surrogate loss for multivariate Gaussians, which we apply to train a context-adaptive amortized estimator that maps perturbation metadata to gene-gene dependency network parameters. The resulting model, CellVS-Net, achieves SOTA on predicting how gene networks restructure under a variety of complex multivariate experimental conditions, including different cell types, small molecule therapeutics, signaling molecules, gene knockdowns, and gene over-expressions. When compared to other molecular and cell-level representations of drugs, we find that CellVS-Net achieves SOTA on both virtual screening benchmarks. Overall, CellVS-Net demonstrates that cell-level virtual screening methods are a viable alternative to molecular screening, and associated benchmarks enable hill-climbing on relevant drug discovery tasks.
Leaf, C. M.; Qi, P.; Gandhi, Y. P.; Jalali-Yazdi, F.; Ong, J. N.; Takahashi, T. T.; Kalia, R.; Roberts, R. W.
In vitro selection and directed evolution technologies, such as mRNA display, explore large libraries (≥10^14 variants) and generate thousands to millions of functional polypeptide ligands to a variety of targets. Denoising diffusion implicit machine learning models (DDIMs) trained using display-derived deep sequencing data can greatly expand these functional sequences beyond what is accessible experimentally. However, methods are needed to predict peptide properties such as binding free energies (ΔG°). Here, we applied machine learning methods to predict binding free energies of both experimental and DDIM-generated peptide ligands against a target of interest, the oncogenic protein Bcl-xL. To do this, we trained a Closed-form Continuous (CfC) neural network on a dataset of 15,700 peptide ligands, using pairs of sequences and their corresponding binding free energies (ΔG°) as inputs. This type of model was chosen for its ability to represent irregular series. The resulting CfC model accurately predicts the rank order, within error, and binding free energies (ΔG°) of both experimental and DDIM-generated peptides, identifying five DDIM-generated peptides with single-digit picomolar affinities. Combining trained DDIM and CfC models offers a unified route to expand the scope of experimental ligand discovery and to predict the molecular properties of both experimental and generated ligands, and it highlights the utility of large quantitative datasets for making accurate in silico predictions of high-affinity peptide candidates.
Statement: High-throughput sequencing analysis of mRNA display libraries enables generating novel peptide ligands and expands the scope of functional sequences beyond what is accessible experimentally. 
Closed-form Continuous neural networks trained using sequences and their corresponding free energies accurately predict the binding free energies of both experimental and machine learning-generated peptides, enabling a route to quantitatively predict peptide properties using directed evolution data.
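Binding free energies and affinities are linked by the standard relation ΔG° = RT ln(K_d / 1 M). The short sketch below places "single-digit picomolar" on the ΔG° scale. It assumes 25 °C (298.15 K) and a 1 M standard state, since the abstract states neither. The helper names `dg_from_kd` and `kd_from_dg` are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_from_kd(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant (M), 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_dg(dg_kcal, temp_k=298.15):
    """Inverse of dg_from_kd: dissociation constant (M) from a binding free energy (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


# "Single-digit picomolar" spans Kd = 1 pM up to just under 10 pM.
for kd in (1e-12, 1e-11):
    print(f"Kd = {kd:.0e} M -> dG = {dg_from_kd(kd):.2f} kcal/mol")
```

Under these assumptions, single-digit picomolar binders fall roughly between -16.4 and -15.0 kcal/mol. Each additional 1.4 kcal/mol of binding free energy corresponds to about a tenfold tighter K_d.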
Hughes, O.; Foley, G.; Balderson, B.; Piper, M.; Boden, M.
Robust and reproducible results are essential for confident scientific analysis. We demonstrate that transcription factor (TF) Chromatin Immunoprecipitation coupled with sequencing (ChIP-seq) suffers from systematic bias that may threaten its reproducibility: 80% of 200+ condition-matched, dual-replicate experiments in ENCODE contain genomic regions of systematic bias. We observe this regional bias even between replicates produced within the same experiment, resulting in thousands of unreplicated peaks, which often contain valuable biological data. We provide evidence that regional bias may lead to qualitative differences in TF biology inferred by different experiments; we discovered eight TFs with binding activity in compact chromatin that was identified by one experiment, yet systematically absent from others. To mitigate the effects of bias, we derive simple but effective metrics to quantify the quality of data within biased regions and demonstrate that they can be used for the robust integration of data from multiple experiments.
Marsalkova, E.; Simecek, P.
Multimodal protein language models have transformed protein design, yet their capacity to capture complex topological features remains poorly understood. We use knotted proteins, rare structures in which the backbone forms a nontrivial topological knot, as a test case to probe this capacity using ESM3, a generative protein language model. ESM3's guided generation produces knotted proteins with an 89% success rate (95% CI: 81-94%), compared to ~0.5% for unguided diffusion-based approaches. Knot topology is remarkably robust to sequence perturbation: on average 84% of the protein sequence must be altered before the knot breaks, and the loss follows a sharp threshold rather than gradual degradation. Strikingly, structural drift accumulates well before topological disruption, suggesting that topology is more robust than specific three-dimensional arrangement. Generated proteins show no close sequence similarity to known knotted proteins, arguing against simple memorization. These findings have implications for protein engineering and, more speculatively, for discussions of biosecurity in the era of generative biological AI.
Pan, X.; Saunders, R.; Replogle, J. M.; Weissman, J. S.; Zhuang, X.
Understanding how cell states change in response to genetic perturbations is critical for gene-function and therapeutics discovery. However, state-of-the-art deep-learning models trained on large single-cell omics datasets still struggle to accurately predict cellular responses to perturbations, highlighting the need for a better understanding of the cell-state space and how cells move through this space. Here, we present a contrastive learning model that integrates diverse scRNA-seq datasets into a global, interpretable cell-state manifold. We further develop a framework to integrate this global cell-state manifold with genome-scale perturbation data to identify gene-expression programs that define principal axes of cell-state transitions and functional embeddings of genes that define major perturbation classes. Applying this framework across Perturb-seq datasets on different cell types reveals conserved cellular responses to perturbations, as well as cell-type-specific rewiring of stress responses. Moreover, we perform a genome-scale Perturb-seq screen in human embryonic stem cells, validating and extending these findings and uncovering a class of mesenchymal transitions induced by diverse perturbations to cellular stress-response pathways.
Osman, T. O.; Rios, K. I.; Hart, A.; Shin, S.-y.; Nguyen, L. K.
Sequential drug combinations can significantly enhance therapeutic efficacy, yet the general principles governing when and why sequential administration outperforms concurrent treatment remain poorly understood. While empirical evidence demonstrates that the order and timing of drug exposure can be critical, a mechanistic framework to predict which regulatory architectures are primed for sequential benefit is currently lacking. Here, we systematically enumerated and dynamically analysed 59,040 four-node network topologies to identify the structural design principles that dictate sequential efficacy. Our analysis reveals that only a small fraction of network architectures robustly confer a sequential advantage and identifies a minimal structural requirement for this benefit: a positive feedback loop between the primary drug target and its downstream oncogenic output, coupled with antagonistic crosstalk from a secondary drug target. We demonstrate that this architecture enables bistability, allowing the first drug to reconfigure the network into a suppressed attractor state that is inaccessible through concurrent administration. The treatment schedule determines which of two coexisting stable states the system ultimately occupies, with the gap time between doses defining a critical therapeutic window. Only when the first drug is given sufficient time to displace the system past a threshold does the sequential regimen achieve superior suppression. Our findings establish bistability-enabling network motifs as predictive determinants of sequential drug efficacy and provide a topology-based framework for the rational design of time-dependent combination therapies.
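The threshold behaviour described above can be illustrated with a deliberately minimal model. The sketch is a one-node caricature, not one of the four-node topologies analysed in the preprint. It assumes self-activation with a Hill coefficient of 2 as the positive feedback, and a drug that increases turnover of the oncogenic output x. Both choices, the parameter values, and the helper name `simulate` are hypothetical. Without drug, the system has two stable states separated by an unstable threshold. A drug pulse returns the system to the high state after withdrawal if it is too short, but locks it into the low state if it lasts long enough to push x past the threshold. The sketch shows only this dependence on exposure time. It does not model the sequential-versus-concurrent comparison.

```python
def simulate(pulse_duration, u_drug=2.0, a=3.0, dt=1e-3, t_end=30.0):
    """Euler integration of a one-node positive-feedback toy model (hypothetical).

    dx/dt = a * x**2 / (1 + x**2) - (1 + u(t)) * x,  with u(t) = u_drug while t < pulse_duration.
    With a = 3 and no drug, x = 0 and x = (3 + 5**0.5) / 2 (about 2.62) are stable,
    separated by the unstable threshold x = (3 - 5**0.5) / 2 (about 0.38).
    With u_drug = 2, the high state disappears and x decays toward 0.
    """
    x = (a + (a * a - 4) ** 0.5) / 2  # start in the high ("oncogenic") stable state
    t = 0.0
    while t < t_end:
        u = u_drug if t < pulse_duration else 0.0  # drug present only during the pulse
        x += dt * (a * x * x / (1 + x * x) - (1 + u) * x)
        t += dt
    return x


for duration in (0.3, 3.0):
    print(f"pulse {duration:.1f} -> final x = {simulate(duration):.3f}")
```

With these parameters, a 0.3-unit pulse leaves x near 1.5, above the threshold, so it rebounds to about 2.62 after withdrawal. A 3-unit pulse drives x well below 0.38, and the system then settles in the suppressed state without further drug.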
Gallo, H.; Bucci, V.
Forecasting how microbiome-host ecosystems evolve through time simultaneously at the compositional and functional level remains a central challenge in biology. While dynamical systems models (DSMs) can infer and predict community composition from longitudinal abundance data, and constraint-based metabolic models (CBMMs) can estimate metabolic fluxes from genome-scale reconstructions, no existing framework unifies these approaches to generate mechanistically grounded, time-resolved forecasts of both microbial abundances and metabolite dynamics from ecological data alone. Here, we introduce the Dynamical Systems Constrained Metabolic Modeling (DySCoMeMo) framework, a new hybrid computational pipeline that integrates ecological DSMs with CBMMs to predict temporal dynamics of biomass and metabolites across microbial communities and hosts. DySCoMeMo leverages parameters inferred from application of DSMs to microbiome time series data to constrain metabolic modeling over time, thereby bridging ecological interaction networks with genome-scale metabolic modeling. DySCoMeMo is able to predict future community and metabolite dynamics in vitro with accuracy superior to or on par with that of established methods that require actual microbial abundance and/or metabolite data for metabolite network inference or for estimating the per-microbe contribution to the extracellular metabolic pool. DySCoMeMo also generalizes to in vivo data, as it is capable of accurately forecasting microbial and metabolite dynamics in response to dietary perturbations even when host metabolism is included. Finally, DySCoMeMo uniquely enables the identification of keystone species by quantifying their contributions to sustaining metabolic environments. 
Together, our work establishes a generalizable, mechanistically grounded framework for time-resolved forecasting of microbiome-host microbial and metabolic dynamics, bridging ecological interaction inference with genome-scale metabolism of communities.